
Preserve client ID when using Kubernetes storage #1536

Open
wants to merge 1 commit into master

Conversation

erwinvaneyk
Contributor

This is more of a proposal.

Currently, when using the Kubernetes CRD storage, all client IDs are encoded. Although this allows for a more extensive charset, it effectively makes the client identifier unguessable/unconstructable without the dex client. This is cumbersome to work around if you are managing dex through the CRDs rather than through the dex API.

So, I propose not encoding the client ID, since the encoding does not seem to be a necessity for client IDs.

An alternative implementation would be to add an option to the Kubernetes storage config, "PreserveClientID", which preserves the client ID only when enabled.
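
To make that alternative a bit more concrete, here is a rough sketch of what the storage-side logic could look like (the field and function names are illustrative, not the actual dex internals):

```go
package kubernetes

import (
	"crypto/sha256"
	"encoding/base32"
	"strings"
)

// Sketch only: field and function names are illustrative, not dex's actual internals.

// Config is the Kubernetes storage configuration, extended with the proposed flag.
type Config struct {
	InCluster        bool `json:"inCluster"`
	PreserveClientID bool `json:"preserveClientID"` // proposed option
}

// clientResourceName picks the Kubernetes object name for an OAuth2Client resource.
func (c *Config) clientResourceName(clientID string) string {
	if c.PreserveClientID {
		// Use the client ID verbatim, so `kubectl get oauth2clients` shows it directly.
		return clientID
	}
	// Current-style behaviour: derive an opaque, DNS-safe name from the ID.
	sum := sha256.Sum256([]byte(clientID))
	name := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
	return strings.ToLower(name)[:25]
}
```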

@venezia
Contributor

venezia commented Aug 30, 2019

Similarly to my comment in #1538 - you're not supposed to modify custom resources directly.

@srenatus - should modifying CRs directly be an option? Personally I think it would be akin to going off and doing random sql operations in a database for an app instead of using the app's intended mechanisms to make such changes.

@erwinvaneyk
Contributor Author

erwinvaneyk commented Sep 2, 2019

Hey @venezia - thanks for the review (again)! 😜

Similarly to my comment in #1538 - you're not supposed to modify custom resources directly.

should modifying CRs directly be an option? Personally I think it would be akin to going off and doing random sql operations in a database for an app instead of using the app's intended mechanisms to make such changes.

I am struggling a bit here to see what actual constraints prevent us from striving to improve dex to support a more 'kubernetes-native' workflow.

It is great that dex supports CRDs as a config storage mechanism. But it seems like a waste not to support the usual workflow of using CRDs. Technically, beyond this issue, the most dynamic parts of the dex config (clients and connectors) seem to support the CRD approach already: adding/updating/removing clients through the CRD API is fully supported, and the same goes for connectors.

There are some advantages to the CRD approach that the gRPC approach does not offer:

  • You can deploy dex + config in a single kubectl apply. With the gRPC API this is more cumbersome: we need to deploy dex, keep polling/waiting for it to be deployed, expose it, and then we need to use a custom client to configure dex.
  • You can view/edit the config using kubectl alone, through the same (k8s) API, which lets us reuse the same authn/authz, admission controllers, etc. There is no need to find and retrieve a custom client for dex.

I think, right now, the CRD implementation is just not ideal, as you pointed out in #1538 and others in #1308 and #1333. I would be happy to contribute further to help improve that integration.

@venezia
Contributor

venezia commented Sep 2, 2019

I could see it a few ways @erwinvaneyk

On one hand, Kubernetes custom resources are not exactly a stellar storage vehicle (there are no user-defined indexes (I think it's just apiVersion+kind), under the hood etcd just stores each resource as a JSON blob, listwatch isn't that performant, k8s aims for "eventually consistent" but provides no atomic operations / transactions, etc.). However, the Kubernetes API is a great take on things: declare something and controllers work to make it happen. On a personal note, I think Custom Resources make a lot of sense when they are used in a way that, in the end, results in Kubernetes primitives (like an etcd operator, for example).

In this situation, we're essentially storing data, and the end result is not some controller converting our resources into core primitives; rather, we're simply storing application data. Storing such data in Kubernetes is not very different from storing your database in, say, the Windows registry. You can do it, but I'm unsure if it is great. (And to be clear, I do do this, and again this is just my opinion, and I have seen people try to store 500 KB resources instead of storing them as files - that's a lot of data pouring across the wire.)

I think that for better performance you're likely going to want to use a different storage vehicle, as the documentation suggests. If nothing else, for increased portability: it's "hard" to move custom resources from one cluster to another, but there are a lot of runbooks on how to replicate and migrate, say, MySQL to another environment.

I think leveraging Kubernetes custom resources is a great way to lower the "investment" users must make in order to try out dex. I think it works at a certain scale, but I'm unsure if you would use this setup for, say, 300k users, for all the aforementioned reasons.

The problem I have with embracing users using kubectl apply is that it blesses yet another vector for users to make administrative changes. I don't think we would want to encourage users to make haphazard sql changes if the storage backend was SQL, so what makes this different?

To your point about seeding a deployment with data (I believe you like the idea of kubectl apply-ing resources so that the config is there when the application is ready, vs. waiting for the application to be ready and then communicating with it) - this is a common problem for any application that leverages a database. There are ways to set up "static" connectors and the like (which have their own drawbacks).

But in the vernacular of kubernetes and helm, what would stop us from creating a job or post-install hook that would wait until things are working and then configure the application's database (in this case custom resources) using the grpc endpoint? Wouldn't that be a storage-driver-agnostic approach? And in doing so, we would be able to help users not just using Kubernetes custom resources, but also etcd, sql, etc.
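
As a rough sketch of what such a hook could look like, assuming dex's generated gRPC api package (the import path, address, and client values below are placeholders to illustrate the idea):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	"github.com/dexidp/dex/api" // generated gRPC client; path may differ per dex version
)

func main() {
	// Address of dex's gRPC endpoint; placeholder value.
	conn, err := grpc.Dial("dex.example.svc.cluster.local:5557", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial dex: %v", err)
	}
	defer conn.Close()
	client := api.NewDexClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	// Retry until dex answers, so the Job/hook can start before dex is ready.
	for {
		_, err := client.CreateClient(ctx, &api.CreateClientReq{
			Client: &api.Client{
				Id:           "dex-frontend-app",
				Name:         "Frontend app",
				Secret:       "change-me",
				RedirectUris: []string{"https://frontend.example.com/callback"},
			},
		})
		if err == nil {
			log.Println("client created")
			return
		}
		if ctx.Err() != nil {
			log.Fatalf("gave up waiting for dex: %v", err)
		}
		log.Printf("dex not ready yet: %v", err)
		time.Sleep(5 * time.Second)
	}
}
```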

To be clear, I like Kubernetes, I do many things in Kubernetes, but I also think that we should strive for solutions that work across all supported storage vehicles instead of just one. I'm also not necessarily against making things more kubernetes-native, but at the same time I'm unsure if it is a solid idea to support multiple vectors.

@erwinvaneyk
Contributor Author

erwinvaneyk commented Sep 3, 2019

I completely agree with you, @venezia. I think we are on the same page w.r.t. the pros, cons, patterns, antipatterns of using CRDs. The CRD + operator 'pattern' is great, and also the main use case for CRDs. Dex indeed differs in that it uses CRDs less as "desired state" and more as just configuration storage.

I think the main thing that our perspectives differ on is the use case.

My use case for this is just prototyping/development. Right now, what I care about is ease of use, minimizing complexity, and avoiding complex workflows (which seem to be the main reasons why dex supports CRDs at all). For my prototype situation, those concerns outweigh the performance and portability concerns that you pointed out (though those are indeed valid concerns at scale!).

I am looking at this from a very pragmatic point of view: dex supports CRD-based storage. What I want is to be able to create and edit a client CRD both at startup and at runtime. Dex, being stateless, supports this already, except for a single, minor issue, which this PR addresses. Fixing this issue does not open a new vector (at least not any more than there already was).

I think I made a mistake of emphasizing the workflow/vector in this PR too much. 😉


The actual thing this PR is about is changing the resource name of the client to the client ID, which is more readable and easier to reference:

Before this change:

$ kubectl get oauth2clients
client/apw11euz23

After this change:

$ kubectl get oauth2clients
client/dex-frontend-app

Would purely this change be agreeable to you, @venezia ?

@venezia
Contributor

venezia commented Sep 3, 2019

@erwinvaneyk

What about passwords.dex.coreos.com - these would still need to be hashed as kubernetes ids must match [a-z0-9]([-a-z0-9]*[a-z0-9])? and email addresses would definitely not conform to this expression.

Are you suggesting we should have hashes for the IDs of some custom resources (oauth2clients), but not others (passwords)? Will that inconsistency in resource naming not confuse someone even more?
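
To illustrate the constraint (a quick standalone sketch; the values are just examples):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Kubernetes object names must match this pattern (the rule quoted above).
	nameRe := regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

	for _, candidate := range []string{
		"dex-frontend-app",  // a plain client ID: already a valid object name
		"alice@example.com", // an email used as a password ID: not a valid object name
	} {
		fmt.Printf("%-20s valid=%v\n", candidate, nameRe.MatchString(candidate))
	}
}
```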

I agree with you that lowering the amount of effort a user has to invest to administrate a dex installation is a great idea - but again, should we just create a storage-driver-agnostic mechanism to seed the data into the system through the front door? (the grpc API) I don't think you disagreed with using the grpc api, you just did not like having to wait for it to become ready - which is understandable, and why I suggested a job/post-install helm chart hook to do this.

FWIW, I was actually working on a dex administrative website to interact with the grpc api because I did want to make changes to dex easy. I initially was going to just do things through the custom resources directly, but then read that the documentation discourages modifying CRs directly - so we both want similar things, honestly. (Though again, doing one that works with the grpc api would work across all storage mechanisms, which might make it a more useful-to-all option.)

@erwinvaneyk
Contributor Author

What about passwords.dex.coreos.com - these would still need to be hashed as kubernetes ids must match [a-z0-9]([-a-z0-9]*[a-z0-9])? and email addresses would definitely not conform to this expression.

Are you suggesting we should have hashes for the IDs of some custom resources (oauth2clients), but not others (passwords)? Will that inconsistency in resource naming not confuse someone even more?

It is already inconsistent right now: only the OAuth2Clients and Passwords CRDs use encoded names; other CRDs (such as Connectors) do not encode the name. I personally think this change will improve the situation, since with Passwords there is a clear reason why the name is encoded (the email address is used as the name), whereas with OAuth2Clients it seems arbitrary.

I agree with you that lowering the amount of effort a user has to invest to administrate a dex installation is a great idea - but again, should we just create a storage-driver-agnostic mechanism to seed the data into the system through the front door? (the grpc API) I don't think you disagreed with using the grpc api, you just did not like having to wait for it to become ready - which is understandable, and why I suggested a job/post-install helm chart hook to do this.

No, indeed, I don't disagree. But let's try to scope this discussion a bit and focus on the contents of this PR (see the last part of the previous comment). 😁

FWIW, I was actually working on a dex administrative website to interact with the grpc api because I did want to make changes to dex easy. I initially was going to just do things through the custom resources directly, but then read that the documentation discourages modifying CRs directly - so we both want similar things, honestly. (Though again, doing one that works with the grpc api would work across all storage mechanisms, which might make it a more useful-to-all option.)

Unrelated to this PR, that sounds interesting to me. Are you planning to open-source it?

@venezia
Contributor

venezia commented Sep 6, 2019

It is already inconsistent right now: only the OAuth2Clients and Passwords CRDs use encoded names; other CRDs (such as Connectors) do not encode the name. I personally think this change will improve the situation, since with Passwords there is a clear reason why the name is encoded (the email address is used as the name), whereas with OAuth2Clients it seems arbitrary.

In that case, accept my apologies for being a nuisance, @erwinvaneyk . If we're already inconsistent, no harm done.

That said, do you think you could add backward compatibility / migration? I could be wrong, (and have been) but it looks like an existing user might have problems if they upgraded to a release with this code.

@erwinvaneyk
Contributor Author

In that case, accept my apologies for being a nuisance, @erwinvaneyk . If we're already inconsistent, no harm done.

No worries at all @venezia - I prefer PRs being properly reviewed and scrutinized. 😉

That said, do you think you could add backward compatibility / migration? I could be wrong, (and have been) but it looks like an existing user might have problems if they upgraded to a release with this code.

Yes, we can. We could add a parameter for the Kubernetes storage configuration which, if set, preserves the client IDs as proposed in this PR.

I am just waiting a bit for further reviews/feedback on this change in general, before spending the time to actually implement that.
